An information theoretic measure is derived that quantifies the statistical coherence between systems evolving in time. The standard time delayed mutual information fails to distinguish information that is actually exchanged from shared information due to common history and input signals. In our new approach, these influences are excluded by appropriate conditioning of transition probabilities. The resulting transfer entropy is able to distinguish driving and responding elements and to detect asymmetry in the coupling of subsystems.



The time evolution of a system may be called irregular if it generates information at a non-zero rate. For stochastic or deterministically chaotic systems, this is quantified by the entropy rate. For a system consisting of more than one component, important information on its structure can be obtained by measuring to what extent the individual components contribute to information production and at what rate they exchange information with each other. This paper proposes a method to answer the latter question on the basis of time series observations.

Many authors have used mutual information to quantify the overlap of the information content of two (sub)systems. Unfortunately, mutual information contains neither dynamical nor directional information. Introducing a time delay in one of the observations is an important, if somewhat arbitrary, improvement in this respect, but it still does not explicitly distinguish information that is actually exchanged from information due to the response to a common input signal or history.

The purpose of this paper is to motivate and derive an alternative information theoretic measure, to be called transfer entropy, that shares some of the desired properties of mutual information but takes the dynamics of information transport into account. With minimal assumptions about the dynamics of the systems and the nature of their coupling, one will be able to quantify the exchange of information between two systems, separately for both directions, and, if desired, conditional on common input signals.

This work may be seen in the context of a considerable number of recently proposed measures for the nonlinear coherence of signals, used to study generalized synchronization phenomena in many contexts, most notably in physiological systems. While these measures are often very powerful for a specific set of applications, it is also important to aim at an understanding of the underlying theoretical concepts. In the generic case that neither the systems nor their coupling may be assumed to be deterministic, information theory seems to be an appropriate starting point.

Let us briefly recall the most basic concepts of information theory. The average number of bits needed to optimally encode independent draws of the discrete variable I following a probability distribution p(i) is given by the Shannon entropy

$$H_I = -\sum_i p(i) \log_2 p(i),$$

where the sum extends over all states i the process can assume. The base of the logarithm only determines the units used for measuring information and will be dropped henceforth.
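As a concrete illustration of the definition, the short Python sketch below evaluates the Shannon entropy of a given discrete distribution. The function name `shannon_entropy` and its `base` parameter are illustrative choices and not part of the original text; states with zero probability are skipped, following the usual convention 0 log 0 = 0.

```python
import math

def shannon_entropy(probs, base=2.0):
    """H = -sum_i p(i) log p(i); states with p(i) = 0 contribute nothing.

    base=2 measures information in bits, base=math.e in nats; the base
    only fixes the unit of information.
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)


print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # four equally likely states: 2 bits
print(shannon_entropy([0.9, 0.1]))                # a biased coin needs less than 1 bit
```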

In order to construct an optimal encoding that uses just as many bits as given by the entropy, it is necessary to know the probability distribution p(i). The excess number of bits that will be coded if a different distribution q(i) is used instead of p(i) is given by the Kullback entropy [3], $K_I = \sum_i p(i) \log p(i)/q(i)$. We will later also need the Kullback entropy for conditional probabilities p(i|j). For a single state j we have $K_j = \sum_i p(i|j) \log p(i|j)/q(i|j)$. Summation over j with respect to p(j) yields

$$K_{I|J} = \sum_{i,j} p(i,j) \log \frac{p(i|j)}{q(i|j)}. \qquad (1)$$
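The two Kullback entropies can be evaluated directly from tabulated distributions. In the sketch below (function names and the dictionary layout keyed by `(i, j)` are illustrative assumptions), the conditional version implements Eq. (1) by first forming the marginal p(j) and then p(i|j) = p(i,j)/p(j). Logarithms are taken to base 2, so results are in bits.

```python
import math

def kullback_entropy(p, q):
    """K = sum_i p(i) log2(p(i)/q(i)): excess bits per symbol when data drawn
    from p are encoded with a code that is optimal for q."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def conditional_kullback_entropy(p_joint, q_cond):
    """Eq. (1): K_{I|J} = sum_{i,j} p(i,j) log2(p(i|j)/q(i|j)).

    p_joint maps (i, j) -> p(i, j); q_cond maps (i, j) -> q(i|j).
    """
    p_j = {}
    for (i, j), pij in p_joint.items():
        p_j[j] = p_j.get(j, 0.0) + pij            # marginal p(j)
    return sum(pij * math.log2((pij / p_j[j]) / q_cond[(i, j)])
               for (i, j), pij in p_joint.items() if pij > 0)


# Coding a fair coin as if heads had probability 1/4 costs about 0.21 extra bits.
print(kullback_entropy([0.5, 0.5], [0.25, 0.75]))
```

Using the true conditional distribution as q(i|j) gives K = 0, in line with the interpretation as excess code length.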



The mutual information of two processes I and J with joint probability $p_{IJ}(i,j)$ can be seen as the excess amount of code produced by erroneously assuming that the two systems are independent, i.e. assuming $q_{IJ}(i,j) = p_I(i)\,p_J(j)$ instead of $p_{IJ}(i,j)$. The corresponding Kullback entropy is

$$M_{IJ} = \sum p(i,j) \log \frac{p(i,j)}{p(i)\,p(j)}, \qquad (2)$$

which is the well known formula for the mutual information. Here and in the following, we omit the summation index and the subscript of the probabilities specifying the process. This derivation shows that mutual information is a natural way to quantify the deviation from independence of two processes. We have $M_{IJ} = H_I + H_J - H_{IJ} \ge 0$. Note that $M_{IJ}$ is symmetric under the exchange of I and J and therefore does not contain any directional sense.
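A minimal sketch of Eq. (2) for a tabulated joint distribution follows; the dictionary layout and the function name are illustrative assumptions. Swapping the roles of I and J (transposing the table) leaves the result unchanged, which is the symmetry noted above.

```python
import math

def mutual_information(p_joint):
    """Eq. (2): M_IJ = sum p(i,j) log2[p(i,j) / (p(i) p(j))], in bits.

    p_joint maps (i, j) -> p(i, j); the marginals are obtained by summation.
    """
    p_i, p_j = {}, {}
    for (i, j), pij in p_joint.items():
        p_i[i] = p_i.get(i, 0.0) + pij
        p_j[j] = p_j.get(j, 0.0) + pij
    return sum(pij * math.log2(pij / (p_i[i] * p_j[j]))
               for (i, j), pij in p_joint.items() if pij > 0)


correlated = {(0, 0): 0.5, (1, 1): 0.5}                         # two identical fair bits
independent = {(i, j): 0.25 for i in (0, 1) for j in (0, 1)}
table = {(0, 0): 0.5, (0, 1): 0.2, (1, 1): 0.3}
swapped = {(j, i): p for (i, j), p in table.items()}           # exchange I and J
print(mutual_information(correlated), mutual_information(independent))
print(mutual_information(table), mutual_information(swapped))  # equal values
```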
A related, non-symmetric quantity is the conditional entropy $H_{I|J} = -\sum p(i,j) \log p(i|j) = H_{IJ} - H_J$. However, since $H_{I|J} - H_{J|I} = H_I - H_J$, it is non-symmetric only due to the different individual entropies and not due to information flow. Mutual information can be given a directional sense in a somewhat ad-hoc way by introducing a time lag in either one of the variables and computing, e.g.,

$$M_{IJ}(\tau) = \sum p(i_n, j_{n-\tau}) \log \frac{p(i_n, j_{n-\tau})}{p(i)\,p(j)}.$$
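The time-lagged variant can be estimated from two symbol sequences by counting lagged pairs. The sketch below is a plug-in (histogram) estimate, assuming the marginals p(i) and p(j) are taken from the same lagged pairs; the function name is hypothetical. In the toy usage, X copies Y with one step of delay, so the lagged measure is close to 1 bit at τ = 1 and close to 0 at τ = 0. Note that the lag must be chosen by hand, which is the ad-hoc element mentioned above.

```python
import math
import random
from collections import Counter

def delayed_mutual_information(x, y, tau):
    """Plug-in estimate (bits) of M(tau) = sum p(i_n, j_{n-tau}) log2[p(i_n, j_{n-tau}) / (p(i) p(j))]
    for symbol sequences x (states i) and y (states j), with tau >= 0."""
    pairs = [(x[n], y[n - tau]) for n in range(tau, len(x))]
    total = len(pairs)
    joint = Counter(pairs)
    p_i = Counter(i for i, _ in pairs)
    p_j = Counter(j for _, j in pairs)
    # c/total * log2[(c/total) / ((p_i/total) (p_j/total))]
    return sum(c / total * math.log2(c * total / (p_i[i] * p_j[j]))
               for (i, j), c in joint.items())


rng = random.Random(1)
y = [rng.randint(0, 1) for _ in range(100_000)]
x = [0] + y[:-1]                            # x_n = y_{n-1}
m1 = delayed_mutual_information(x, y, 1)    # close to 1 bit
m0 = delayed_mutual_information(x, y, 0)    # close to 0
```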



As we will see below, considering the two systems at different times occurs naturally as soon as transition probabilities are introduced. This will yield a more justified approach to measuring information transfer that explicitly incorporates directional, dynamical structure.

One can incorporate dynamical structure by studying transition probabilities rather than static probabilities. Consider a system that may be approximated by a stationary Markov process of order k, that is, the conditional probability to find I in state $i_{n+1}$ at time n+1 does not depend on the state $i_{n-k}$: $p(i_{n+1} | i_n, \dots, i_{n-k+1}) = p(i_{n+1} | i_n, \dots, i_{n-k})$. Henceforth we will use the shorthand notation $i_n^{(k)} = (i_n, \dots, i_{n-k+1})$ for words of length k, or k-dimensional delay embedding vectors.

The average number of bits needed to encode one additional state of the system if all previous states are known is given by the entropy rate

$$h_I = -\sum p(i_{n+1}, i_n^{(k)}) \log p(i_{n+1} | i_n^{(k)}). \qquad (3)$$

Since $p(i_{n+1} | i_n^{(k)}) = p(i_{n+1}^{(k+1)})/p(i_n^{(k)})$, this is just the difference between the Shannon entropies of the processes given by (k+1)- and k-dimensional delay vectors constructed from I: $h_I = H_{I^{(k+1)}} - H_{I^{(k)}}$.
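The block-entropy form of Eq. (3) translates directly into a counting estimate. The sketch below assumes a single, long, stationary symbol sequence and uses plug-in probabilities of overlapping words; function names are hypothetical. A fair coin should give about 1 bit per step and a strictly periodic sequence about 0.

```python
import math
import random
from collections import Counter

def block_entropy(seq, k):
    """Plug-in Shannon entropy (bits) of the overlapping words of length k."""
    words = Counter(tuple(seq[n:n + k]) for n in range(len(seq) - k + 1))
    total = len(seq) - k + 1
    return -sum(c / total * math.log2(c / total) for c in words.values())

def entropy_rate(seq, k):
    """Eq. (3) evaluated as h_I = H_{I^(k+1)} - H_{I^(k)}."""
    return block_entropy(seq, k + 1) - block_entropy(seq, k)


rng = random.Random(0)
coin = [rng.randint(0, 1) for _ in range(100_000)]   # i.i.d. fair bits
periodic = [0, 1] * 5_000                             # fully predictable
print(entropy_rate(coin, 2))       # close to 1 bit per step
print(entropy_rate(periodic, 1))   # close to 0
```

For larger k the number of possible words grows exponentially, so the plug-in estimate needs correspondingly longer sequences.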

If I is obtained by coarse graining a continuous system X at resolution $\epsilon$, the entropy $H_X(\epsilon)$ and entropy rate $h_X(\epsilon)$ will depend on the partitioning and in general diverge like $|\log \epsilon|$ when $\epsilon \to 0$. However, for the special case of a deterministic dynamical system, $\lim_{\epsilon\to 0} h_X(\epsilon) = h_X^{KS}$ may exist and is then called the Kolmogorov-Sinai entropy. (For non-Markov systems, also the limit $k \to \infty$ needs to be taken.) Confusingly, the opposite is true for the mutual information. For generic noisy interdependence, $\lim_{\epsilon\to 0} M_{XY}(\epsilon)$ is finite and independent of the partition, but for deterministically coupled processes, $M_{XY}(\epsilon)$ will diverge as $\epsilon \to 0$.

The Shannon entropy and its generalization, the mutual information, are properties of the static probability distributions, while the dynamics of the processes is contained in the transition probabilities. For the study of the dynamics of shared information between processes it is therefore desirable to generalize the entropy rate, rather than the Shannon entropy, to more than one system. In the following, I will propose such generalizations, in particular one that is non-symmetric under the exchange of the two processes.

The most straightforward way to construct a mutual information rate by generalizing $h_I$ to two processes (I, J) is again by measuring the deviation from independence. The corresponding Kullback entropy is sometimes called transinformation and is still symmetric under the exchange of I and J. It is therefore preferable to measure the deviation from the generalized Markov property

$$p(i_{n+1} | i_n^{(k)}) = p(i_{n+1} | i_n^{(k)}, j_n^{(l)}).$$

In the absence of information flow from J to I, the state of J has no influence on the transition probabilities of system I. The incorrectness of this assumption can again be quantified by a Kullback entropy, Eq. (1), by which we define the transfer entropy:

$$T_{J\to I} = \sum p(i_{n+1}, i_n^{(k)}, j_n^{(l)}) \log \frac{p(i_{n+1} | i_n^{(k)}, j_n^{(l)})}{p(i_{n+1} | i_n^{(k)})}. \qquad (4)$$

This is the central concept of this paper. The most natural choices for l are l = k or l = 1. Usually, the latter is preferable for computational reasons. $T_{J\to I}$ is now explicitly non-symmetric since it measures the degree of dependence of I on J and not vice versa.
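For discrete (already coarse grained) sequences, Eq. (4) can be estimated by counting words. The sketch below is a plug-in histogram estimator written under that assumption, not an implementation taken from the text; the function name and argument order are hypothetical. The ratio of conditional probabilities is rewritten in terms of counts, $p(i_{n+1}|i^{(k)},j^{(l)})/p(i_{n+1}|i^{(k)}) = c(i_{n+1},i^{(k)},j^{(l)})\,c(i^{(k)}) / [c(i^{(k)},j^{(l)})\,c(i_{n+1},i^{(k)})]$. In the toy usage, X copies Y with one step of delay, so $T_{Y\to X}$ is close to 1 bit while $T_{X\to Y}$ is close to 0.

```python
import math
import random
from collections import Counter

def transfer_entropy(source, target, k=1, l=1):
    """Plug-in estimate (bits) of Eq. (4), T_{J->I}, for symbol sequences
    target = (i_n) and source = (j_n) with history lengths k and l."""
    start = max(k, l) - 1
    triples = [
        (target[n + 1],
         tuple(target[n - k + 1:n + 1]),   # i_n^(k)
         tuple(source[n - l + 1:n + 1]))   # j_n^(l)
        for n in range(start, len(target) - 1)
    ]
    c_full = Counter(triples)                              # (i_{n+1}, i^(k), j^(l))
    c_hist = Counter((ik, jl) for _, ik, jl in triples)    # (i^(k), j^(l))
    c_self = Counter((i1, ik) for i1, ik, _ in triples)    # (i_{n+1}, i^(k))
    c_past = Counter(ik for _, ik, _ in triples)           # i^(k)
    total = len(triples)
    te = 0.0
    for (i1, ik, jl), c in c_full.items():
        ratio = (c * c_past[ik]) / (c_hist[(ik, jl)] * c_self[(i1, ik)])
        te += c / total * math.log2(ratio)
    return te


rng = random.Random(0)
y = [rng.randint(0, 1) for _ in range(100_000)]
x = [0] + y[:-1]                   # x_{n+1} = y_n: Y drives X
te_yx = transfer_entropy(y, x)     # T_{Y->X}, close to 1 bit
te_xy = transfer_entropy(x, y)     # T_{X->Y}, close to 0
```

The defaults k = l = 1 follow the computationally cheaper choice mentioned above.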

For coarse grained states (I, J) of continuous systems (X, Y), the limit $\lim_{\epsilon\to 0} T_{Y\to X}(\epsilon)$ is finite and independent of the partition, except for the case of deterministic coupling, when $T_{Y\to X}(\epsilon)$ diverges as $\epsilon \to 0$. In this respect, transfer entropy behaves like mutual information. If computationally feasible, the influence of a known common driving force Z may be excluded by conditioning the probabilities under the logarithm on $z_n$ as well.
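Conditioning on a known driver can be sketched for symbol sequences by adding $z_n$ to every word under the logarithm. The version below assumes k = l = 1 and conditions only on the current driver state; the function name and the toy system are illustrative assumptions. In the example, Z drives X with one step of delay and is seen by Y only through noise, so Y appears to drive X. Passing a constant sequence as the driver reduces the function to the unconditioned transfer entropy, which is used here for comparison.

```python
import math
import random
from collections import Counter

def conditional_transfer_entropy(source, target, driver):
    """Plug-in estimate (bits) of T_{J->I} with k = l = 1, where all
    probabilities under the logarithm are also conditioned on z_n."""
    quads = [(target[n + 1], target[n], source[n], driver[n])
             for n in range(len(target) - 1)]
    c_full = Counter(quads)                                  # (i_{n+1}, i_n, j_n, z_n)
    c_hist = Counter((i, j, z) for _, i, j, z in quads)      # (i_n, j_n, z_n)
    c_self = Counter((i1, i, z) for i1, i, _, z in quads)    # (i_{n+1}, i_n, z_n)
    c_past = Counter((i, z) for _, i, _, z in quads)         # (i_n, z_n)
    total = len(quads)
    return sum(c / total * math.log2(c * c_past[(i, z)]
                                     / (c_hist[(i, j, z)] * c_self[(i1, i, z)]))
               for (i1, i, j, z), c in c_full.items())


rng = random.Random(1)
z = [rng.randint(0, 1) for _ in range(100_000)]      # common driver Z
x = [0] + z[:-1]                                     # x_{n+1} = z_n
y = [zn ^ (rng.random() < 0.1) for zn in z]          # y_n: noisy copy of z_n
te_plain = conditional_transfer_entropy(y, x, [0] * len(z))  # no conditioning
te_cond = conditional_transfer_entropy(y, x, z)               # conditioned on z_n
```

Without conditioning, roughly 0.53 bits appear to flow from Y to X; once $z_n$ is included, the apparent transfer vanishes.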

For numerical and practical applications, the limit $\epsilon \to 0$ is not attainable and has to be replaced appropriately. Either one can study transfer entropy as a function of the resolution, or one can fix a resolution for the scope of a study. Furthermore, there are several methods of coarse graining, and a partition consisting of a fixed mesh of boxes is not always the best choice. Fixed boxes are only suitable when data can be produced with little effort and when small statistical errors at a reasonable speed of computation are desired.

FIG. 1. Transfer entropy $T_{I^{m-1}\to I^m}$ for the coupling direction as a function of the coupling strength $\epsilon$ in a tent map lattice (binary partition). Error bars: error of the mean of 10 runs of 100000 iterates. Line: theoretical curve $a^2\epsilon^2/\ln 2$ with fitted $a = 0.77$.

For time series applications, an alternative implementation using generalized correlation integrals is preferable. Mutual information and redundancies have been generalized for their estimation by order q correlation integrals. It is possible to follow the same arguments in generalizing transfer entropy. However, for the computationally most attractive case q = 2, we would have to give up positivity of $T_{I\to J}$. Instead, we propose an implementation of the definition (4) where the probability measure $p(i_{n+1}, i_n, j_n)$ is realized by a sum over all available realizations of $(x_{n+1}, x_n, y_n)$ in a time series. The transition probabilities are expressed by joint probabilities and then obtained by kernel estimation, e.g.

$$\hat p_r(x_{n+1}, x_n, y_n) = \frac{1}{N} \sum_{n'} \Theta\left( r - \left| \begin{pmatrix} x_{n+1} - x_{n'+1} \\ x_n - x_{n'} \\ y_n - y_{n'} \end{pmatrix} \right| \right).$$

We use the step kernel $\Theta(x > 0) = 1$, $\Theta(x < 0) = 0$. The norm $|\cdot|$ can simply be the maximum distance, but other norms and kernels can be considered. In particular, different overall scales of X and Y can be accounted for by using appropriate weights. As in standard dimension and entropy calculations, fast neighbour search strategies are advisable for all but the smallest data sets. Dynamically correlated pairs should be excluded as usual. Since these technical issues are the same as in many nonlinear time series methods, the reader is referred to the discussion in the literature.
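The kernel implementation can be sketched with brute-force pair counting instead of a fast neighbour search. The sketch below assumes k = l = 1, the maximum norm, and the step kernel; `theiler` is a hypothetical parameter that excludes pairs closer in time than the given number of steps. Because all four counts are taken over the same set of partners, the normalizations cancel, and $T_{Y\to X}$ is estimated as the average over n of $\log_2[c(x_{n+1},x_n,y_n)\,c(x_n) / (c(x_n,y_n)\,c(x_{n+1},x_n))]$. Points without any partner in the full three-dimensional neighbourhood are skipped, which is a simplification chosen here. The toy system (i.i.d. uniform Y driving a linear X) is an illustrative assumption.

```python
import math
import random

def kernel_transfer_entropy(source, target, r, theiler=1):
    """Correlation-sum estimate (bits) of T_{source->target} with k = l = 1.

    With x = target and y = source, partners n' are counted within distance r
    in the maximum norm (step kernel). Pairs with |n - n'| < theiler are
    excluded; theiler=1 drops only the self-pair n' = n.
    """
    x, y = target, source
    n_pts = len(x) - 1                     # triples (x_{n+1}, x_n, y_n) available
    total, used = 0.0, 0
    for n in range(n_pts):
        xn, yn, xn1 = x[n], y[n], x[n + 1]
        c3 = c_xy = c_xx = c_x = 0
        for m in range(n_pts):             # brute force O(N^2) neighbour search
            if abs(n - m) < theiler or abs(xn - x[m]) >= r:
                continue                   # excluded pair, or x_n not within r
            c_x += 1
            close_y = abs(yn - y[m]) < r
            close_next = abs(xn1 - x[m + 1]) < r
            if close_y:
                c_xy += 1
            if close_next:
                c_xx += 1
            if close_y and close_next:
                c3 += 1
        if c3 > 0:                         # then c_xy, c_xx and c_x are positive too
            total += math.log2(c3 * c_x / (c_xy * c_xx))
            used += 1
    return total / used if used else float("nan")


rng = random.Random(2)
y = [rng.random() for _ in range(800)]
x = [0.0]
for n in range(799):
    x.append(0.3 * x[-1] + y[n])              # Y drives X; X does not act on Y
te_yx = kernel_transfer_entropy(y, x, r=0.1)  # clearly positive
te_xy = kernel_transfer_entropy(x, y, r=0.1)  # near zero
```

For longer series, the inner loop is where a fast neighbour search and a larger exclusion window would be substituted.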

In order to demonstrate the use of transfer entropy, let us study three examples: two spatio-temporal systems and a bivariate physiological time series. In a one-dimensional lattice of unidirectionally coupled maps

$$x_{n+1}^m = f\left(\epsilon\, x_n^{m-1} + (1-\epsilon)\, x_n^m\right), \qquad (5)$$

information can be transported only in the direction of increasing m. One of the simplest cases is given by the tent map, $f(x) = 2x$ for $x < 0.5$ and $f(x) = 2 - 2x$ for $x \ge 0.5$. Let us study coarse grained states $I^m$ with $i_n^m$ defined by a partition at $x = 0.5$. At zero coupling, all static and transition probabilities are equal to 1/2, $M(\tau) = 0$ for all values of $\tau$, and also $T_{I^{m-1}\to I^m} = T_{I^m\to I^{m-1}} = 0$. For nonzero coupling, we still have $T_{I^m\to I^{m-1}} = 0$, but $T_{I^{m-1}\to I^m}$ becomes positive. For small coupling, it can be assumed that the invariant density at a single site is essentially unchanged, whence the transition probabilities $p(i_{n+1}^m | i_n^m, i_n^{m-1})$ are changed by an amount proportional to $\epsilon$. In particular, p(0|0,0), p(0|1,1), p(1|0,1), and p(1|1,0) are increased by a factor $1 + a\epsilon$ with $a = O(1)$. All others are decreased by that amount. Evaluating (4) in lowest order of $\epsilon$ with k = l = 1, we obtain $T_{I^{m-1}\to I^m} = a^2\epsilon^2/\ln 2 + O(\epsilon^4)$. For this particular case, the changes in $p(i_{n+1}^m, i_n^{m-1})$ exactly cancel out and the mutual information is zero.

Figure 1 shows a numerical verification of these results for a spatially periodic lattice of 100 maps. Averages of 10 runs of $10^5$ iterates after $10^5$ transients are shown. The transfer entropy $T_{I^m\to I^{m-1}}$ and both directions of $M(\tau = 1)$ were found to be consistent with zero and are therefore not shown.

FIG. 2. Transfer entropies $T_{X^{m-1}\to X^m}$ and $T_{X^m\to X^{m-1}}$ (solid lines) and time delayed mutual information $M_{X^1X^2}(\tau = 1)$ and $M_{X^2X^1}(\tau = 1)$ (dashed lines) as functions of the coupling strength $\epsilon$ for a unidirectionally coupled Ulam lattice. For both quantities, the upper line denotes the direction $X^{m-1} \to X^m$, while the lower line shows the reverse direction $X^m \to X^{m-1}$. Although the lattice undergoes a sequence of bifurcations, the transfer entropy T clearly reflects the unidirectional character of the coupling. It also consistently outperforms the time delayed mutual information in this respect. See text for further details.

The situation is more complicated for the Ulam map $f(x) = 2 - x^2$ and non-small coupling. For each coupling, a bivariate time series was generated using a lattice of 100 points (random initial conditions) and recording 10000 iterates of $x^1$ and $x^2$ after $10^5$ steps of transients. Correlation sums at r = 0.2 were used to compute the mutual information in both directions, $M_{X^1X^2}(\tau = 1)$ and $M_{X^2X^1}(\tau = 1)$, as well as the transfer entropies $T_{X^1\to X^2}$ and $T_{X^2\to X^1}$ with k = l = 1. Neighbors closer in time than 100 iterates were excluded from the kernel estimation.
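The data generation described above can be sketched as follows. The function name, the seed, and the convention that sites 0 and 1 of a periodic lattice play the roles of $x^1$ (driver) and $x^2$ (driven) are illustrative assumptions; the defaults follow the numbers quoted in the text. Pure Python is slow at that size, so the usage line runs a much smaller lattice.

```python
import random

def ulam_lattice_series(eps, size=100, transient=100_000, length=10_000, seed=0):
    """Iterate Eq. (5) with the Ulam map f(x) = 2 - x**2 on a periodic lattice
    and return the recorded series of site 0 (driver) and site 1 (driven)."""
    rng = random.Random(seed)
    state = [rng.uniform(-2.0, 2.0) for _ in range(size)]  # random initial conditions
    x1, x2 = [], []
    for step in range(transient + length):
        # state[m - 1] is state[-1] (the last site) for m = 0: periodic boundary
        state = [2.0 - (eps * state[m - 1] + (1.0 - eps) * state[m]) ** 2
                 for m in range(size)]
        if step >= transient:
            x1.append(state[0])
            x2.append(state[1])
    return x1, x2


# Much smaller than in the text, to keep the example fast.
x1, x2 = ulam_lattice_series(eps=0.3, size=20, transient=1_000, length=500)
```

Since the coupling in Eq. (5) is a convex combination and the Ulam map sends [-2, 2] into itself, all recorded values stay in that interval.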

Figure 2 shows M and T as functions of the coupling strength. Both M and T are able to detect the anisotropy, since the information is consistently larger in the positive direction. The lattice undergoes a number of bifurcations when the coupling is changed. Around $\epsilon = 0.18$, the asymptotic state is of temporal and spatial period two. For this case, the mutual information is found to be 1 bit. This value is correct even though information is neither produced nor exchanged; it reflects the static correlation between the sites. The transfer entropy finds a zero rate of information transport, as desired. Around this periodic window, the mutual information is non-zero in both directions and the signature of the unidirectional coupling is less pronounced. Around $\epsilon = 0.82$, the lattice settles to a (spatially inhomogeneous) fixed point state. Here both measures correctly show zero information transfer. The most important finding, however, is that the transfer entropy for the negative direction remains consistent with zero for all couplings, reflecting the causality in the system.

FIG. 3. Bivariate time series of the breath rate (upper) and instantaneous heart rate (lower) of a sleeping human. The data are sampled at 2 Hz. Both traces have been normalized to zero mean and unit variance.

FIG. 4. Transfer entropies $T(\text{heart} \to \text{breath})$ (solid line), $T(\text{breath} \to \text{heart})$ (dotted line), and time delayed mutual information $M(\tau = 0.5\,\text{s})$ (directions indistinguishable, dashed lines) for the physiological time series shown in Fig. 3.

As a last example, take a bivariate time series (see Fig. 3) of the breath rate and instantaneous heart rate of a sleeping human suffering from sleep apnea (part of data set B of the Santa Fe Institute time series contest held in 1991). Figure 4 shows that while the time delayed mutual information is almost symmetric between both series, the transfer entropy indicates a stronger flow of information from the heart rate to the breath rate than vice versa over a significant range of length scales r. Note that for small r, the curves deflect down to zero due to the finite sample size. This result is consistent with the observation that the patient breathes in bursts which seem to occur whenever the heart rate crosses some threshold. Of course, both signals could also be responding to a common external trigger.

In conclusion, the new transfer entropy is able to detect the directed exchange of information between two systems. Unlike mutual information, it is designed to ignore static correlations due to the common history or common input signals. The most prominent applications include multivariate analysis of time series and the study of spatially extended systems.

Several authors have proposed to use the time delayed mutual information $M(\Delta i, \tau)$ as a function of spatial distance $\Delta i$ and temporal delay $\tau$ to define a velocity of information transport in spatio-temporal systems. Often, one finds that $M(\Delta i, \tau)$ for fixed $\Delta i$ reaches a local maximum at some lag $\tau^*$. Hence a velocity can be defined by the ratio $\Delta i/\tau^*$, in particular if that ratio is fairly constant over the resolvable range of values for $\Delta i$. This reasoning has been challenged by an example in which the above interpretation implies superluminal communication. In fact, much of the common information is due to the common history that allows the lattice to partially synchronize. Preliminary results indicate that appropriate conditioning on the common history, by replacing the time delayed mutual information with a variant of Eq. (4), resolves this apparent paradox. However, conditioning with respect to a large number of variables poses immense numerical problems, so this study will be concluded at a later time.

Part of this work has been supported by the SFB 237 
of the Deutsche Forschungsgemeinschaft. 